An expansive functionality and complexity has been ascribed to the majority of the human genome that was
unanticipated at the outset of the draft sequence and assembly a decade ago. We are now faced with the challenge of
integrating and interpreting this complexity in order to achieve a coherent view of genome biology. We argue that the 
linear representation of the genome exacerbates this complexity and that an understanding of its three-dimensional structure
is central to interpreting the regulatory and transcriptional architecture of the genome. Chromatin conformation capture 
techniques and high-resolution microscopy have afforded an emergent global view of genome structure within the nu- 
cleus. Chromosomes fold into complex, territorialized three-dimensional domains in concert with specialized subnuclear 
bodies that harbor concentrations of transcription and splicing machinery. The signature of these folds is retained within 
the layered regulatory landscapes annotated by chromatin immunoprecipitation, and we propose that genome contacts 
are reflected in the organization and expression of interweaved networks of overlapping coding and noncoding tran- 
scripts. This pervasive impact of genome structure favors a preeminent role for the nucleoskeleton and RNA in regulating 
gene expression by organizing these folds and contacts. Accordingly, we propose that the local and global three-di- 
mensional structure of the genome provides a consistent, integrated, and intuitive framework for interpreting and un- 
derstanding the regulatory and transcriptional complexity of the human genome. 



It is testament to the rapid advances achieved in genome research 
that our conception of the human genome has changed dramati- 
cally since the publication of the first draft assembly over a decade 
ago (International Human Genome Sequencing Consortium 2001; 
Venter et al. 2001). At that time, our interpretation of the human 
genome was largely focused on the ~1% protein-coding fraction
that was interspersed across vast and largely uncharacterized in- 
tergenic noncoding regions. Aided by the advent of increasingly 
cheap high-throughput sequencing technologies, the genome has 
been rapidly annotated with detailed regulatory landmarks and 
transcriptional maps, revealing a complex array of overlapping 
and interlacing transcripts and a layered terrain of open and closed 
chromatin, diverse histone modifications, nucleotide modifica- 
tions, and transcription factor occupancies (The ENCODE Project 
Consortium 2012). These overlapping layers act in concert, and in 
combination encompass the majority of the genome, comprising a 
vast landscape whose detail and rich complexity were unanticipated
at the outset of the human genome project.

We are now faced with the task of interpreting this huge 
catalog of data in an integrated and systematic manner. Here, we 
argue that this interpretation can be achieved by reference to the 
three-dimensional folding of the genome in the nucleus. We argue 
that, despite its value, the current one-dimensional representation 
impairs an intuitive understanding of the genome, and that many 
current regulatory maps intrinsically reflect, indeed retain, the 
signatures of its higher order structure, which in turn has an 
overbearing role in the organization and architecture of genes and 
in regulating gene expression. Therefore, achieving a detailed and 
accurate three-dimensional representation of the genome within 
the nucleus has emerged as one of the major goals currently facing 
the field of genomic research. 

Rendering the regulatory landscape in three dimensions

The human genome sequence exposed vast non-protein-coding 
regions that are replete with responsive and cell-specific regulatory 
elements (Thurman et al. 2012). Chromatin immunoprecipitation 
(ChIP) has been an invaluable technique for surveying these re- 
gions and is now widely used to identify transcription factor 
binding sites and chromatin modifications (Landt et al. 2012). The 
first genome-wide application of ChIP revealed an intricate land- 
scape containing an unexpectedly large number of transcription 
factor binding sites across chromosomes 21 and 22, often in regions
distal to gene promoters (Cawley et al. 2004). However, many of 
these promiscuous sites are cross-linked at low levels, and similar 
sites fail to drive patterned reporter gene expression when sys- 
tematically assayed in Drosophila (Fisher et al. 2012). Notably, 
many of these sites also do not contain corresponding tran- 
scription factor sequence recognition motifs, and a further sub- 
set, termed "transcription factor hotspots," exhibit simultaneous 
overlapping signals to numerous transcription factors (Fig. 1; 
Moorman et al. 2006; Roy et al. 2010; Neph et al. 2012). 

Rather than bona fide sites of transcription factor binding, 
these promiscuous sites may reflect an artifactual enrichment 
resulting from proximal nonspecific cross-linking between con- 
tacts within a tightly folded genome structure. During the initial 
step of the ChIP-seq protocol, formaldehyde is used to cross-link
occupied DNA and bound proteins, which are then immunopre- 
cipitated by antibodies against the transcription factor of interest 
and digested to yield the occupied DNA for sequencing. However, 
the initial formaldehyde cross-linking can also nonspecifically link 
DNA sequences that are not bound, but rather in close spatial 
proximity to proteins, resulting in the parallel, collateral pre- 
cipitation of juxtaposed genomic regions, potentially explaining 
the lack of binding nucleotide motifs within many ChIP-seq enrichments
(Fig. 1A). Similarly, targeted proteins may be constituents
of larger multiprotein complexes. As a result, fixation with
formaldehyde would immunoprecipitate the entire multiprotein
complex, including any sequences bound by intermediate protein
partners, resulting in a single sequence exhibiting a simultaneous
enrichment for the full range of transcription factors within the
complex, providing a potential interpretation for the existence
of transcription factor hotspots (Fig. 1B).

Figure 1. Examples of proximal enrichments resulting from ChIP-seq. (A) ChIP-seq of a transcription factor (green) results in immunoprecipitation of
bound DNA sequence (blue) as well as additional DNA sequence (orange) in close proximity. Only bound sequence shows evidence of DNase I footprint
and binding motif. (B) Immunoprecipitation of DNA sequence associated with large multiprotein complex results in artifactual indirect enrichments for
a wide range of transcription factors. (C) Active enhancers exhibit a range of ChIP-seq enrichments as a result of a close spatial proximity to histone
modification and transcription factors at promoters.

While these scenarios argue for the careful interpretation of
signal enrichments within ChIP-seq libraries, they also suggest
that ChIP-seq libraries retain information on the three-dimensional
folding of the genome and its interaction with protein structures.
Indeed, this prospect forms the basis for the chromatin interaction
analysis by paired-end tag sequencing (ChIA-PET) approach
(Fullwood et al. 2009). ChIA-PET uses the same protocol
as ChIP-seq, including the initial formaldehyde cross-linking
and immunoprecipitation of targeted protein, but with the addition
of a ligation step that joins coprecipitating DNA sequences
before sequencing, thereby discerning those regions of the genomic
sequences that copurify due to close proximity. For example,
utilizing ChIA-PET shows not only the residence of the
Ser2-hypophosphorylated form of RNA polymerase II at human
gene promoters, but also the aggregation of these gene promoters
into higher-order networks of coregulated and cotranscribed genes
(Li et al. 2012). Similarly, ChIA-PET targeting H3K4me2 modifications
is able to delineate interactions between promoters and
distal enhancers (Chepelev et al. 2012). A comparison of ChIP-seq
and matched ChIA-PET libraries reveals the extent to which numerous
ChIP-seq sites may be parsimoniously resolved as alternative
contacts with a common transcription factor.
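
The comparison described above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method taken from any of the cited studies: the function name `classify_peaks`, the peak identifiers, and the toy data are all hypothetical. It labels a ChIP-seq peak as "direct" when it carries the factor's recognition motif, as "indirect" when it lacks the motif but is joined by a ChIA-PET ligation to a peak that has one, and as "unexplained" otherwise.

```python
from collections import defaultdict


def classify_peaks(peaks, loops):
    """Label each ChIP-seq peak as 'direct', 'indirect', or 'unexplained'.

    peaks: dict mapping a peak id to True/False (does the peak contain
           the recognition motif of the targeted factor?)
    loops: iterable of (peak_id, peak_id) pairs joined by ChIA-PET ligation
    """
    # Build an undirected map from each peak to its ligation partners.
    partners = defaultdict(set)
    for a, b in loops:
        partners[a].add(b)
        partners[b].add(a)

    labels = {}
    for peak, has_motif in peaks.items():
        if has_motif:
            labels[peak] = "direct"
        elif any(peaks.get(p, False) for p in partners[peak]):
            # No motif here, but in contact with a motif-bearing site.
            labels[peak] = "indirect"
        else:
            labels[peak] = "unexplained"
    return labels


# Toy data, invented for illustration only.
peaks = {"promoter_A": True, "enhancer_B": False, "intergenic_C": False}
loops = [("promoter_A", "enhancer_B")]
labels = classify_peaks(peaks, loops)
```

In this toy case the motif-less enhancer is resolved as a contact of the bound promoter, whereas the intergenic site remains unexplained. A real analysis would also need to account for peak widths, ligation noise, and motif-scoring thresholds, none of which are modeled here.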

Immunofluorescent microscopy using matched antibodies
directly illustrates the structural information implicit within ChIP-seq
libraries, visualizing the subnuclear distribution of transcription
factors, histone modifications, and specialized subnuclear
structures (Mao et al. 2011). Transcription machinery and factors
are not uniformly diffused throughout the nucleus but coalesce as
distinct and discrete foci, and histone modifications often form
broad nuclear domains, such as the aggregation of H3K9-methylated
regions at the nuclear periphery (Bartova et al. 2008). These
nuclear domains are not obvious when matched ChIP-seq libraries
are aligned to the genome sequence. For example, the H3K27me3
domains and sites of Polycomb complex occupancy that occur
concurrently at Hox gene clusters dispersed across the
Drosophila genome in fact reflect the convergent localization of
these distal Hox loci to common Polycomb bodies within the nucleus
(Cheutin and Cavalli 2012; Sexton et al. 2012; Towbin et al.
2012). This suggests that the complexity apparent within our
current linear representation of the regulatory landscape may be
interpreted as the complex folding of the genome around common
subnuclear structures, and a more judicious understanding of
ChIP-seq libraries could be achieved with reference to
three-dimensional genome structure.

Resolving genome folding 

Chromatin conformation capture techniques are the main current 
approach by which to infer three-dimensional genome structure 
(de Wit and de Laat 2012). These techniques also use formalde- 
hyde-mediated cross-linking to resolve contact between genomic 
loci, followed by restriction enzyme digestion to extract cross- 
linked fragments from the chromatin. Digested termini undergo 
proximal ligation to form intramolecular fragments that can be used 
to measure the population-averaged frequency of interactions be- 
tween two genomic regions (Dekker et al. 2002). This technique has 
been instrumental in determining significant and stable interactions 
between two genomic loci, such as the close physical interaction 
between the locus control regions and active globin genes that loop 
out ~40-60 kb of intervening sequence (Tolhuis et al. 2002).
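
As a rough illustration of what a population-averaged interaction frequency means in practice, the sketch below bins ligation junctions into a symmetric contact matrix and normalizes it by the total number of junctions. It is a minimal sketch under assumptions: the function names, the bin size, and the coordinates are invented for the example and do not reproduce any published processing pipeline.

```python
def contact_matrix(pairs, region_length, bin_size):
    """Count ligation junctions between fixed-size bins of one region.

    pairs: iterable of (position_a, position_b) in base pairs, one entry
           per ligation product recovered from the cell population.
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


def contact_frequency(matrix):
    """Divide each count by the total number of junctions observed."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(i, n))
    if total == 0:
        return [[0.0] * n for _ in range(n)]
    return [[count / total for count in row] for row in matrix]


# Hypothetical junctions in a 60-kb region, binned at 20 kb.
pairs = [(5_000, 45_000), (8_000, 52_000), (12_000, 14_000), (41_000, 58_000)]
matrix = contact_matrix(pairs, region_length=60_000, bin_size=20_000)
freq = contact_frequency(matrix)
```

Here half of the toy junctions link the first and last bins, which are separated by roughly 40 kb of sequence, so the off-diagonal entry `freq[0][2]` stands out in the way a stable loop would. Because the counts are pooled across many cells, a high frequency may reflect either a stable contact in most cells or a frequent but transient one.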

The global three-dimensional structure of the genome can be 
inferred from techniques, such as Hi-C, that combine chromatin
conformation capture with sequencing (Lieberman-Aiden et al. 
2009). These studies support the adoption of a fractal-globule or- 
ganization that enables the ready extrication and decondensation 
of the genome (Bancaud et al. 2012), as well as the organization of 
chromosomes into distinct radially organized subnuclear territo- 
ries that were previously visualized by fluorescent in situ hybrid- 
ization (Bolzer et al. 2005). These territories are further divided into 
gene-rich domains that extend away from the nuclear periphery 
and are sites of active gene expression and early replication, with 
the reciprocal exclusion of gene-poor regions that encompass 
a compact repressive late-replicating heterochromatin fraction 
(Simonis et al. 2006; Boyle et al. 2011; Kalhor et al. 2012). 

As chromatin conformation capture has achieved higher
resolution, smaller structural units, known as topologically associated
domains (TADs), have been detected (Dixon et al. 2012; Nora
et al. 2012). These mega-base-sized successive domains partition 
the genome into local, distinct, and introverted folded regions 
linked by intervening unfolded regions. Although contacts within 
these domains are dynamic, the borders of these domains are re- 
markably conserved during differentiation and between cell types, 
and seem to impose an intrinsic modular architecture to the ge- 
nome. These topological domains also exhibit a close concor- 
dance to transcription factor occupancy and epigenetic domains, 
including large blocks of H3K27me3 and H3K9me2 repression 
(Lan et al. 2012; Shen et al. 2012). This correlation may reflect
the fact that ChIP-seq and Hi-C approaches both measure, in different ways,
the folding of the genome around a distinct subnuclear domain,
where TAD formation may delimit these segmental chromatin
blocks (Nora et al. 2012).
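
One simple way to see how introverted domains and their borders could be read out of a contact matrix is to count, for every position, the contacts that cross it. The sketch below is an illustrative assumption rather than the approach used in the studies cited above: the function names, the window size, and the toy matrix are all hypothetical. Positions where few contacts cross, relative to their neighbors, are candidate domain borders.

```python
def crossing_contacts(matrix, window):
    """Sum contacts between the `window` bins upstream and downstream
    of each candidate border b (the border lies between bins b-1 and b)."""
    n = len(matrix)
    scores = {}
    for b in range(window, n - window + 1):
        scores[b] = sum(
            matrix[i][j]
            for i in range(b - window, b)
            for j in range(b, b + window)
        )
    return scores


def call_boundaries(scores):
    """Return borders whose score is a strict local minimum."""
    return [
        b for b in scores
        if b - 1 in scores and b + 1 in scores
        and scores[b] < scores[b - 1] and scores[b] < scores[b + 1]
    ]


# Toy matrix: two three-bin domains with frequent contacts inside each
# domain (10) and rare contacts between them (1). Values are invented.
domain_of = [0, 0, 0, 1, 1, 1]
toy = [
    [10 if domain_of[i] == domain_of[j] else 1 for j in range(6)]
    for i in range(6)
]
scores = crossing_contacts(toy, window=2)
boundaries = call_boundaries(scores)
```

In the toy matrix the only local minimum falls between bins 2 and 3, which is where the two domains meet. Real data would require normalization for distance-dependent decay and sequencing biases before such a score becomes meaningful.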

Transcription at factories 

The concept of transcription factories was first proposed in response
to the clustering of transcription factors as distinct and discrete
foci within the nucleus (Jackson et al. 1993; Wansink et al.
1993). Transcription factories comprise large subnuclear assem- 
blies that encompass a range of transcription factors and ma- 
chinery constituents along with additional accessory proteins for 
RNA processing and splicing (Jackson et al. 1993; Melnik et al. 
2011; Edelman and Fraser 2012). A highly specialized example of 
a transcription factory is provided by the nucleolus, a subnuclear 
organelle responsible for rDNA transcription that harbors the 
dedicated machinery required for ribosomal RNA transcription,
elongation, and maturation (Hernandez-Verdun et al. 2010).
Over 2000 clustered rRNA copies dispersed over five chromosomes 
are recruited together to the nucleolus, where they are cotran- 
scribed on the surface of the fibrillar center within the nucleolus 
(Nemeth and Langst 2011). 

RNA polymerase I and II-dependent transcription has also 
been associated with similar centralized structures, with electron 
spectroscopic imaging visualizing a porous heterogeneous protein-rich
core, with nascent transcription proceeding on the surface
(Eskiw et al. 2008). Emerging evidence suggests that active RNA 
polymerase II is commonly bound to the surface of transcription 
factories (Papantonis and Cook 2011). The use of fluorescent in 
situ hybridization to register the relative movement of gene loci 
and nascent transcripts during the transcription cycle shows the 
DNA sequence tracking through RNA polymerase II complexes 
that themselves remain immobile with reference to the transcrip- 
tion factory (Papantonis et al. 2010). However, the generality of 
this model is not yet resolved with, for example, microscopy of the 
Hsp70 loci in Drosophila polytene chromosomes providing con- 
flicting evidence for a classical model of polymerase II recruitment 
(Yao et al. 2007). 

The immobilization of numerous active RNA polymerase II 
complexes to a single specialized active compartment affords the 
coexpression of multiple genes (Zhou et al. 2006). Erythroid genes, 
located at distal sites across the genome, accrue at common tran- 
scription factories when transcriptionally active, with silent genes 
being excluded (Schoenfelder et al. 2010). These common compartments
where the erythroid genes unite also appear specialized,
harboring specific transcription factors, such as KLF1, relevant to 
erythroid gene expression. Similarly, the STAT transcription factor 
anchors coregulated genes to common compartments during the
nuclear reorganization that accompanies T-cell differentiation (Hakim
et al. 2013). This aggregation of multiple genes to specialized tran- 
scription factories with varying and specific regulatory components 
may be responsible for the correct and coordinated expression of 
distinct gene ontologies. Indeed, following transfection, mini- 
chromosomes cluster to different transcription factories according 
to the promoters and introns they contain (Xu and Cook 2008). 

The activation of a range of genes, including the Myc and 
globin genes and the collinear activation of Hox genes (Osborne 
et al. 2004, 2007; Morey et al. 2009; Schoenfelder et al. 2010) is 
coincident with their nuclear relocation. The potential for this 
relocation to target genes to pre-assembled transcriptional com- 
partments offers an alternative to the classical model of transcription 
factor recruitment. Although yet to be realized, this alternative 
model switches our point of reference from the linear genome 
being the central structure upon which transcription factors as- 
sociate de novo to a three-dimensional genome that dynamically 
traffics genes or promoters to a central scaffold of pre-assembled 
transcriptional complexes (Cook 2010; Edelman and Fraser 2012). 
Nevertheless, such movement would be restricted within the 
confines of the genome's global architecture. Live cell imaging
shows that the movement of gene loci is constrained to a tight 
volume within the nucleus (Strickfaden et al. 2010), and ligand- 
induced changes to gene expression that include rapid and global 
transcriptional changes expand the interactions between genomic 
regions, but do not incur the major reorganization of chromo- 
somes (Hakim et al. 2011). 

An overarching nucleoskeleton 

The local folding of enhancers to genes and genes to transcription 
factories promotes the topography of the genome into an over- 
arching regulatory role. Intranuclear order, including the structure 
and movement of the genome, is organized by a dense, filamen- 
tous nucleoskeleton (Simon and Wilson 2011). Many proteins of 
the nucleoskeleton, including lamins, titin, actin, myosins, and 
kinesins associate with DNA, histones, chromatin modifying pro- 
teins, transcription factors, and the general transcriptional machin- 
ery (de Lanerolle and Serebryannyy 2011). Actin comprises a major 
component of the nucleoskeleton, of chromatin remodeling com- 
plexes, and enhances transcription by interaction with promoter 
and coding sequences, the RNA polymerase I-III complexes, and
other RNA processing proteins (Hofmann et al. 2004). Similarly, 
specialized nuclear-localized myosins and kinesins are molecular 
motors that traffic cargo over long ranges along actin or microtu- 
bule filaments to transcriptional machinery at active genes (Pestic- 
Dragovich et al. 2000; Fomproix and Percipalle 2004; Chuang et al. 
2006) and, in the case of Myosin 5a, to SC35 speckles that harbor
splicing factors (Pranchevicius et al. 2008). 

Genome folding relies on the nucleoskeleton. Large-scale 
chromosomal repositioning in response to serum starvation is 
rapid, requires energy, and is dependent on active nuclear motor 
complexes (Mehta et al. 2010). The nucleoskeleton may also direct 
the traction of genes to nuclear bodies such as transcription fac- 
tories. Following induction by a transcriptional activator, migra- 
tion of chromosomal loci from the nuclear periphery is perturbed 
in actin and myosin mutants (Chuang et al. 2006). Furthermore, 
the collinear induction of HOXB gene expression is actin dependent 
(Ferrai et al. 2009), and the recruitment of snRNA genes to Cajal 
bodies, spherical subnuclear organelles that specialize in snRNP 
biogenesis, requires actin and myosin (Dundr et al. 2007). Simi- 
larly, both actin and myosin play a primary role in recruiting rDNA 
clusters to the nucleolus in response to the requirements of cellular
growth and differentiation (Philimonenko et al. 2004). This range 
of transcription factories that are dependent on actin and myosin 
anticipates a broad and preeminent role for the nucleoskeleton in 
organizing genome folding and gene expression. 

Complex networks of transcription 

The complexity and sheer size of the transcriptional landscape is 
surely one of the most significant findings to emerge since the 
publication of the human genome. Given that the signature of 
genome structure is written into the regulatory landscape, we ar- 
gue it is likely that this signature is similarly written into the 
transcriptional landscape. Initial cDNA sequencing and tiling array 
projects revealed that the transcription of protein-coding genes is 
accompanied by noncoding RNAs (Carninci et al. 2005; Kapranov 
et al. 2007a). Vast swaths of noncoding DNA are transcribed into 
short and long noncoding RNAs that are commensurate in di- 
versity and abundance with protein-coding genes, and have been 
increasingly accepted as legitimate gene products (Mercer et al. 
2009). Indeed, we have still yet to reach the frontiers of the tran- 
scriptome, with targeted RNA sequencing revealing further range 
and complexity of noncoding transcription in intergenic regions 
not otherwise detected by conventional RNA sequencing (Mercer 
et al. 2012). The profiling of additional tissues, developmental stages, 
and cell types continues to expand these limits and collectively 
ascribe a massive depth and breadth to the human transcriptome. 

Coding and noncoding genes are organized as incredibly 
complex networks of layered, interleaved, antisense, and over- 
lapping transcripts (Kapranov et al. 2005). This transcriptional 
complexity has revealed the modular design principles of the ge- 
nome, whereby a single sequence can be incorporated in numerous 
ways into a range of coding and noncoding, sense and antisense 
transcripts that overlap to form complex networks (Kapranov et al. 
2007b). In response to this recurrent complexity throughout the 
genome, we now consider the transcript as the basic unit of the 
transcriptome, with the concept of a gene being revised to a higher- 
order definition that encompasses a functionally related group of 
transcripts influencing a given phenotype (Mattick 2003; Gerstein 
et al. 2007; Gingeras 2007; Djebali et al. 2012). 

The folding of transcriptional complexity 

The immobilization of RNA polymerase to nuclear structures ties 
the complexity of transcriptional initiation and elongation to ge- 
nome structure. Recognition that splicing is a cotranscriptional 
process also provides an avenue by which genome structure can 
influence RNA processing. Therefore, we considered whether the 
modular design of the genome and its transcription and processing 
reflects and can be understood through the three-dimensional 
structure of the genome. 

Gene expression requires the combinatorial action of alter- 
native transcription initiation, splicing, and termination, with 
local chromatin loops communicating close coordination between 
these processes (Fig. 2; Tan-Wong et al. 2008; Moore and Proudfoot
2009). Chromatin conformation capture routinely resolves a loop 
that forms across the gene body, localizing gene termini to the 
promoter and affording contact between transcription initiation 
and termination processes and coassembly of associated machin- 
ery (O'Sullivan et al. 2004; O'Reilly and Greaves 2007; Singh and 
Hampsey 2007; Tan-Wong et al. 2008; Moore and Proudfoot 2009).
This interaction also restricts the divergent transcription of ncRNAs
and imposes directionality on the gene's promoter (Tan-Wong
et al. 2012). Even further interactions between the RNA polymerase
II residing at the alternative promoters used by a gene are
anticipated by ChIA-PET (Li et al. 2012).

Figure 2. Formation of chromatin loops at gene loci permits coordination
between processes of transcription initiation, termination, and
splicing. Promoter and terminal regions of genes colocalize during transcription,
forming a looped structure that enhances transcriptional directionality.
Gene loop formation depends on contacts between both
promoter-associated transcription factors, such as TFIIB, within the preinitiation
complex and polyadenylation factors, such as Ssu72 and
cleavage factor subunits, within the terminator complex. Extensive contacts
between the spliceosome and the initiating and elongating polymerase
II complex also facilitate cotranscriptional splicing.

Multiple coding and noncoding transcripts are often in- 
terwoven into complex transcriptional networks (Carninci et al. 
2005). Genome folding permits these interwoven RNAs to exploit 
a common regulatory architecture. For example, local intragenic 
loops permit a single promoter complex to simultaneously drive 
transcription of both the SPI1 gene promoter and an antisense 
noncoding RNA that is, counterintuitively, hosted within a down- 
stream intron (Ebralidze et al. 2008). Further loops also bring en- 
hancer elements to bear on the promoter complex, resulting in the 
assembly of a higher order structure encompassing the loci. The 
folding of the genome into higher-order structures that loop out
intervening regions can prevent confusion between overlapping
genes and permit compartmentalized transcription for the distinct
expression of intron-hosted genes (Fig. 3). ChIA-PET analysis
targeting RNA polymerase II indicates that the genome can fold 
together multiple overlapping transcripts to share common regu- 
latory features (Li et al. 2012). Such interleaved transcriptional 
networks, which seem complex in the linear representation of the 
genome, may be parsimoniously understood in the context of 
a three-dimensional genome. 

Splicing is increasingly recognized as a cotranscriptional 
process, and splicing machinery and regulators comprise a major 
component of transcription factories (Fig. 2; Melnik et al. 2011). 
Like transcription initiation, a number of observations anticipate 
that local genome topography can be organized with relation to 
the gene's internal intron and exon structure, with exons being 
localized to cognate transcriptional machinery with intervening 
introns looped out (Tan-Wong et al. 2008; Moabbi et al. 2012). 
CTCF, better known for organizing chromatin loops and structure 
in conjunction with cohesin, also occupies alternative exons to 
mediate exon inclusion (Shukla et al. 2011; Lee and Iyer 2012). 
Similarly, a range of histone modifications demarcate the intron
and exon boundaries within an epigenetic landscape that is intimately
linked to genome structure (Luco et al. 2011; Kornblihtt
et al. 2013). Such structural and epigenetic features could help
direct the spliceosome to recognize correct splice sites across often
vast intronic distances.

Figure 3. Three-dimensional interpretation (left) of regulatory and transcriptional complexity in one-dimensional genome representation (right).
(A) The genome forms large complex clusters and introspective folded clusters with specialized transcription compartments. Each of these clusters
correlates to a collection of transcripts and "background" ChIP-seq enrichment. (B) Within each cluster the genome is folded to associate with subnuclear
structures containing transcription factors and machinery, splicing, and other accessory proteins. These associations coregulate genes to
generate interleaved complex transcriptional networks of coding (blue) and noncoding transcripts (green). Proximal cross-linking with ChIP-seq results
in a complex landscape of enrichment across loci that reflect the folded genome structure. (C) Within each gene, local dynamic chromatin folding
determines the association of alternative promoters and local noncoding RNAs with a shared regulatory architecture, thereby mediating coregulated
gene expression.

The imprint of genome structure in the transcriptional landscape

A longer, chromosome-wide perspective shows that these com- 
plex transcriptional networks cluster to form active transcriptional 
foci interspersed by quiescent regions (The FANTOM Consortium 
2005; Kapranov et al. 2007b). These active transcriptional foci may 
associate with a corresponding transcription factory, with the 
complex internal folding of topological domains around com- 
mon regulatory cores relating to the internal detail of tran- 
scriptional networks. Collectively, these transcriptional clusters 
crowd within the active nuclear compartment, with distinct 
knots of folded chromatin that comprise topological domains 
demarcating boundaries between developmentally regulated 
transcriptional hubs, with intervening regions replete with in- 
sulators, RNA polymerase I genes, and repetitive elements (Fig. 3; 
Dixon et al. 2012; Sexton et al. 2012). 

The folding of the genome within successive three-dimensional 
structures would impose constraints on the organization of
encompassed genes. Transcriptional territories may partition adjacent
groups of coexpressed genes in the genome (Caron et al.
2001; Spellman and Rubin 2002). Such territories could create both 
specialized genome property with, for example, the majority of 
testes-expressed genes being tightly clustered within the Drosoph- 
ila genome (Boutanaev et al. 2002) and "valuable" genome prop- 
erty, with ubiquitously expressed genes clustering as the most 
gene-dense regions (Lercher et al. 2002). Clustering of coexpressed 
genes also inversely shapes the genomic distribution of transpos- 
able elements that space out intervening regions (Fontanillas et al. 
2007). 

This constraint that genome structure imposes on gene evo- 
lution is elegantly demonstrated in the collinear organization of 
Hox genes, critical developmental genes that evolved in the bilaterian
ancestor to regulate body plan. Hox genes undergo collinear
activation in distinct overlapping domains according to the body 
axis of animal embryos (Mallo et al. 2010). This collinear tran- 
scriptional activation involves the sequential relocation of genes to 
an active structural compartment, while inactive Hox genes remain 
sequestered within a single repressive structure delimited from 
flanking regions (Noordermeer et al. 2011). Despite the duplica- 
tion, fragmentation, reduction, and expansion of Hox loci that has 
occurred and correlates with major morphological changes, the 
collinear order of Hox gene expression and the progressive re- 
location of genes to active transcriptional compartments has been 
maintained during evolution (Lemons and McGinnis 2006). 

RNA can reciprocally shape nuclear structure

The overarching role for the structure and dynamic movement of 
the genome in regulating transcription may be reciprocated by 
RNA on genome structure. Mature RNA is stably associated with 
the genome, comprising a major part of chromatin where it 
fulfills well-established epigenetic roles (Mondal et al. 2010). The
capacity for sequence-specific interactions with protein makes 
RNA an ideal guide and/or scaffold for the nucleation and assembly 
of the large regulatory structures to which the genome folds. The 
lncRNA NEAT1 is required for interchromatin paraspeckle formation
(Clemson et al. 2009), and the MALAT1 lncRNA sequesters
serine/arginine splicing factors to nuclear speckles (Tripathi et al. 
2010). Additional structures, including histone locus bodies, stress 
bodies, and other epigenetic bodies also require RNA for assembly 
(Shevtsov and Dundr 2011), anticipating a broader role for RNA in 
subnuclear organization. RNA can also mediate the trafficking of 
gene loci to subnuclear bodies, a key prediction of the alternative 
model of gene regulation. In response to growth signals, lncRNAs
and associated chromatin modifying proteins relocate gene loci 
from repressive Polycomb bodies to the activating context of 
interchromatin granules, whereby gene expression is initiated 
(Yang et al. 2011). 

The looping of long-range regulatory enhancers brings regu- 
latory sequences and complexes into contact with promoters to 
regulate gene expression. In conjunction with this folding, en- 
hancers themselves are often bidirectionally transcribed as non- 
polyadenylated noncoding RNAs that are thought to contribute to 
the activation of genes targeted by the enhancer (Kim et al. 2010; 
Melo et al. 2012). Similarly, despite being retained at the site of 
transcription, the lncRNA HOTTIP recruits the WDR5/KMT2A (previously
MLL) complex to impart active modifications to multiple
distal sites throughout the HOXA loci via chromatin looping 
(Wang et al. 2011). The abundance of lncRNAs and eRNAs organized
adjacent to developmental genes could similarly facilitate the
tightly regulated local folding of these loci and their structural re- 
organization during development. 

A new representation of the human genome 

The linear representation of the genome enabled early efforts of 
gene mapping by classical genetic techniques of pedigree analysis, 
molecular techniques of physical mapping, and finally the as- 
sembly of the human genome sequence. Since this sequence was 
published, it has formed an invaluable reference to which genome- 
wide data has been aligned and interpreted. However, the ab- 
straction of the genome to a single dimension ignores the tight 
folding of the genome within the nucleus, and we are beginning to 
realize the limits of this linear representation and how it impairs an 
intuitive conception of the genome. We consider the determination 
and development of a three-dimensional representation of the human
genome to be one of the most significant challenges currently facing
genome biology. 

In recent years the tools and expertise have been developed 
that make a detailed and global description of genome topology 
feasible (de Wit and de Laat 2012). The integration of whole-ge- 
nome and targeted chromatin conformation capture approaches, 
along with ChIA-PET, ChIP-seq, immunofluorescent microscopy,
and fluorescent in situ hybridization, is required to construct and
refine such a model. However, the size, complexity, and dynamism
of genome structure represent a major challenge to achieving
these ambitions. 



In addition to its massive complexity, the genome is a highly 
dynamic structure. While relatively inert large-scale topodomains 
and nuclear structures apply constraints, the genome, particularly 
at a local level, is in continual and stochastic motion. It will be 
a major technical challenge to reproducibly resolve such dynamic 
features. Current chromatin conformation capture techniques pro- 
vide a population-averaged depiction of genome structure, affording 
the identification of recurrent, stable, and significant genome inter- 
actions whereas, in contrast, high-resolution single-cell microscopy 
can resolve individual chromatin interactions and identify dynamic 
genome folding. Nevertheless, despite the dynamism, size, complexity,
and plasticity of the genome that confound any easy
determination of the genome structure, laudable efforts to tackle 
this challenge have already been initiated (Asbury et al. 2010; 
Marti-Renom and Mirny 2011). 

These technical challenges will also require accompanying 
novel visual solutions to render the dynamic genome in three di- 
mensions. A semi-schematic depiction of the genome's internal 
interaction circuitry may achieve a compromise between clarity 
and an accurate representation of detail and complexity. This map 
would have to incorporate and denote dynamic regions that un- 
dergo motion and may be recast in a cell-specific manner. 

Despite these challenges, achieving such a three-dimensional 
representation of the genome would provide an invaluable refer- 
ence for biologists. Aligning and analyzing functional genomic 
and transcriptional data within this spatial context could provide 
an integrated, consistent, and judicious basis for understanding 
the transcriptional and regulatory complexity that has emerged as 
a hallmark of the human genome. 